Ambient LED mode: some optimizations and bugfixes #645
Thanks for picking something that's probably too far down on my list for the foreseeable future, much appreciated! :) A little NEON might go a long way as well, but you'll have to see if you can find a good way to combine that with your manual downsampling approach (without having to copy stuff around). If you blend the result across multiple frames, you could also think about modulating the sample indices you end up using, to make sure you catch all possible colors on static content.
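A rough sketch of what modulating the sample indices could look like, assuming a plain 8-bit channel buffer and a per-frame counter. The names `SamplePhase`, `sample_phase` and `average_strided` are made up for illustration and are not part of this PR. The offset cycles through all stride x stride positions, so static content gets fully covered after stride * stride frames:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: starting offset of the strided scan for one frame. */
typedef struct { int x; int y; } SamplePhase;

/* Walk the x offset first, then the y offset, so that every pixel position
 * is visited exactly once per stride * stride frames. */
static SamplePhase sample_phase(uint32_t frame, int stride)
{
    assert(stride > 0);
    SamplePhase p;
    p.x = (int)(frame % (uint32_t)stride);
    p.y = (int)((frame / (uint32_t)stride) % (uint32_t)stride);
    return p;
}

/* Average one 8-bit channel, sampling every `stride`-th pixel in x and y,
 * starting at the given phase. Returns 0 if nothing was sampled. */
static uint32_t average_strided(const uint8_t *buf, int width, int height,
                                int stride, SamplePhase phase)
{
    assert(stride > 0);
    uint32_t sum = 0, count = 0;
    for (int y = phase.y; y < height; y += stride) {
        for (int x = phase.x; x < width; x += stride) {
            sum += buf[y * width + x];
            count++;
        }
    }
    return count ? sum / count : 0;
}
```

With blending across frames, the per-frame result jumps around a bit more than with a fixed grid, but the blended color should converge on the true average for static screens.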
Are these values the same across all devices? Or will we eventually have to move some of this to platform.c?
I hate magic numbers. What is 1/2/5? I'm assuming it's either mode or area?
mode appears to be the integer value of the ambient mode setting itself, i.e. which LEDs to animate (All, Top, FN, etc.). It should probably be an enum. lightsambient[x] is NextUI's index for the lights. Is this consistent across all supported platforms, or do we have to factor this part out to platform.c?
Hm, we could either do a generic approach and designate an enum that can be used as a bitmask, or keep it as simple as possible. The whole thing could be a lot more elegant; it depends on how much time you want to spend.
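For reference, the bitmask variant could look something like the sketch below. The zone names and the helper `led_zone_enabled` are purely hypothetical; the actual set of zones and how they map onto the existing integer setting and onto lightsambient[x] would still need to be worked out per platform:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical zone flags; real names and the number of zones are assumptions. */
typedef enum {
    LED_ZONE_NONE   = 0,
    LED_ZONE_TOP    = 1u << 0,
    LED_ZONE_FN     = 1u << 1,
    LED_ZONE_STICKS = 1u << 2,
    LED_ZONE_ALL    = LED_ZONE_TOP | LED_ZONE_FN | LED_ZONE_STICKS
} LedZone;

/* True if every bit of `zone` is set in `mask`. */
static inline bool led_zone_enabled(uint32_t mask, LedZone zone)
{
    assert(zone != LED_ZONE_NONE);
    return (mask & (uint32_t)zone) == (uint32_t)zone;
}
```

A call site would then read `if (led_zone_enabled(mask, LED_ZONE_FN))` instead of comparing against 1/2/5.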
Going to submit this as-is for now, maybe I'll take another look in the future :) |
06c1c5a to 0be4c4b
…ame to reduce ensuing flicker
…n RGB565 and RGB888 cores
…ve effects of scrolling 2D tiles
0be4c4b to 35e2436
I'm not really married to any of these ideas, but the amount of CPU time we spend calculating the color of the LEDs seems worth a review.
Remove the second for loop - sometimes this function does a whole second pass over every pixel, which doubles our CPU time.
Don't use fminf() and fmaxf(), because they operate on floats and force a conversion for every call - writing out min and max the tedious way is roughly 16% faster for free.
Don't sample every pixel; blend the resulting color with the previous frame to reduce noisiness.
The last one of these has by far the most impact on the runtime of the function: the cost scales with the number of pixels actually sampled, so every skipped pixel is time saved. I can imagine a worst-case scenario for skipping pixels would be a scrolling grid pattern. With the way that retro tile-based graphics work, I have a hunch that scanning every 7 lines instead of 8 would be a simple and effective way to mitigate this - I haven't tested this yet. I also haven't looked into the possibility of SIMD here. What do you guys think?
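To make the idea concrete, here is a minimal sketch of strided sampling plus frame blending for an RGB565 frame. It is not the code in this PR: the names `average_rgb565_strided` and `blend_channel`, the `pitch_px` parameter (row pitch in pixels) and the 0..256 blend weight are all assumptions for illustration. The integer MIN_INT/MAX_INT macros stand in for fminf()/fmaxf():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Integer min/max written out by hand, so no float conversion is involved. */
#define MIN_INT(a, b) ((a) < (b) ? (a) : (b))
#define MAX_INT(a, b) ((a) > (b) ? (a) : (b))

typedef struct { uint8_t r, g, b; } Rgb888;

/* Average an RGB565 frame, visiting only every step_x-th column and every
 * step_y-th row. Unequal steps such as 8 and 7 are meant to avoid locking
 * onto 8x8 tile grids (untested hunch from the description above). */
static Rgb888 average_rgb565_strided(const uint16_t *pixels, int width, int height,
                                     int pitch_px, int step_x, int step_y)
{
    assert(step_x > 0 && step_y > 0);
    uint32_t r = 0, g = 0, b = 0, n = 0;
    for (int y = 0; y < height; y += step_y) {
        const uint16_t *row = pixels + (size_t)y * (size_t)pitch_px;
        for (int x = 0; x < width; x += step_x) {
            uint16_t p = row[x];
            r += (p >> 11) & 0x1F; /* 5 bits red   */
            g += (p >> 5) & 0x3F;  /* 6 bits green */
            b += p & 0x1F;         /* 5 bits blue  */
            n++;
        }
    }
    Rgb888 out = {0, 0, 0};
    if (n) {
        /* Scale the 5/6-bit averages up to the 8-bit range. */
        out.r = (uint8_t)((r / n) * 255 / 31);
        out.g = (uint8_t)((g / n) * 255 / 63);
        out.b = (uint8_t)((b / n) * 255 / 31);
    }
    return out;
}

/* Move `prev` towards `cur`; weight 0 keeps prev, weight 256 takes cur. */
static uint8_t blend_channel(uint8_t prev, uint8_t cur, int weight)
{
    int w = MAX_INT(0, MIN_INT(weight, 256));
    return (uint8_t)(prev + ((cur - prev) * w) / 256);
}
```

Calling `average_rgb565_strided(frame, w, h, pitch_px, 8, 7)` once per frame and feeding each channel through `blend_channel` with the previous LED color gives the skip-and-blend behavior described above. A lower weight means less flicker but a slower response.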
On CPU Speed Normal, here is what I profiled (mGBA, 1000-frame test sequence; take the exact numbers with a grain of salt):
881us - base performance (depends on test sequence - this was not a worst-case scenario w.r.t. the extra for loop)
830us - remove inner for loop
696us - don't use fminf() and fmaxf()
16us - only check every 8th pixel in both x and y directions (?!?)
Open to feedback as usual :)